AITopics | text-to-video generation model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix – MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Neural Information Processing Systems, Feb-13-2026, 21:40:54 GMT

We list the filtering criteria in Tab. 1.

artificial intelligence, machine learning, video, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Hong Kong (0.04)

Industry: Media (0.95)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Appendix – MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Neural Information Processing Systems, Oct-10-2025, 03:15:43 GMT

We list the filtering criteria in Tab. 1.

caption, miradata, video, (13 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > China > Hong Kong (0.04)

Industry: Media (0.95)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Gender Bias in Text-to-Video Generation Models: A case study of Sora

Nadeem, Mohammad; Sohail, Shahab Saquib; Cambria, Erik; Schuller, Björn W.; Hussain, Amir

arXiv.org Artificial Intelligence, Jan-10-2025

The advent of AI-generated content (AIGC) has spurred extensive scholarly research and revolutionized fields such as content generation [3,4] and medical imaging [5,6]. Significant milestones, such as OpenAI's release of ChatGPT in 2022, have propelled the field toward the ambitious goal of Artificial General Intelligence (AGI). Among generative AI tools, text-to-video (T2V) generation models have gained immense popularity due to their ability to create visually compelling and contextually accurate videos from textual descriptions [7]. Leveraging breakthroughs in generative AI, T2V models such as OpenAI's Sora [8] have showcased unprecedented capabilities in turning textual input into dynamic video output, transforming visual storytelling, advertising, and content creation. However, generative AI models often inherit and amplify the social biases and stereotypes embedded in their training data [9,10]. This training data, sourced from diverse and extensive internet repositories, frequently reflects cultural prejudices, societal inequities, and skewed portrayals of different demographic groups [15].

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

arXiv:2501.01987

Country:

Europe (0.47)
Asia > India (0.29)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Cho, Joseph; Puspitasari, Fachrina Dewi; Zheng, Sheng; Zheng, Jingyao; Lee, Lik-Hang; Kim, Tae-Ho; Hong, Choong Seon; Zhang, Chaoning

arXiv.org Artificial Intelligence, Jun-7-2024

The evolution of video generation from text, from animating MNIST digits to simulating the physical world with Sora, has progressed at breakneck speed over the past seven years. Although text-to-video generation models are often seen as a superficial extension of their text-to-image predecessors, they are built on carefully engineered components. Here, we systematically discuss these components, including but not limited to core building blocks (vision, language, and temporal) and supporting features, from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases that primarily study video synthesis conditioned on text. A close examination of these manuscripts shows that text-to-video generation involves more intricate technologies than a plain extension of text-to-image generation. Our additional review of the shortcomings of Sora-generated videos points to the need for more in-depth studies of various enabling aspects of video generation, such as datasets, evaluation metrics, efficient architectures, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy and requires contributions from the cross-disciplinary research community to advance it as a first step toward realizing artificial general intelligence (AGI).

arxiv preprint arxiv, text-to-video generation model, video, (11 more...)

arXiv.org Artificial Intelligence

arXiv:2403.05131

Country:

Asia > South Korea (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > North Carolina (0.04)
(7 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media (1.00)
Information Technology (1.00)
Leisure & Entertainment (0.92)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)